Problems in Evaluating Grammatical Error Detection Systems

Authors

  • Martin Chodorow
  • Markus Dickinson
  • Ross Israel
  • Joel R. Tetreault
Abstract

Many evaluation issues for grammatical error detection have previously been overlooked, making it hard to draw meaningful comparisons between different approaches, even when they are evaluated on the same corpus. To begin with, the three-way contingency between a writer’s sentence, the annotator’s correction, and the system’s output makes evaluation more complex than in some other NLP tasks, which we address by presenting an intuitive evaluation scheme. Of particular importance to error detection is the skew of the data – the low frequency of errors as compared to non-errors – which distorts some traditional measures of performance and limits their usefulness, leading us to recommend the reporting of raw measurements (true positives, false negatives, false positives, true negatives). Other issues that are particularly vexing for error detection focus on defining these raw measurements: specifying the size or scope of an error, properly treating errors as graded rather than discrete phenomena, and counting non-errors. We discuss recommendations for best practices with regard to reporting the results of system evaluation for these cases, recommendations which depend upon making clear one’s assumptions and applications for error detection. Highlighting the problems with current error detection evaluation should leave the field better able to move forward.
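The effect of skew described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the `Counts` helper and all of the numbers below are made up for illustration, assuming a text of 1,000 tokens in which 5% are errors. It shows why a summary figure such as accuracy says little on its own, and why printing the four raw counts alongside any derived measure is useful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw counts for an error detector (hypothetical helper, not from the paper)."""
    tp: int  # errors the system flagged
    fn: int  # errors the system missed
    fp: int  # non-errors the system flagged
    tn: int  # non-errors the system left alone

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        errors = self.tp + self.fn
        return self.tp / errors if errors else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts: 1,000 tokens, 50 of which are errors (5% error rate).
never_flags = Counts(tp=0, fn=50, fp=0, tn=950)   # system that flags nothing
detector = Counts(tp=30, fn=20, fp=20, tn=930)    # system that finds 30 of 50

for name, c in [("never_flags", never_flags), ("detector", detector)]:
    # Report the raw counts first, then the derived measures.
    print(f"{name}: TP={c.tp} FN={c.fn} FP={c.fp} TN={c.tn} "
          f"acc={c.accuracy():.3f} P={c.precision():.3f} "
          f"R={c.recall():.3f} F1={c.f1():.3f}")
```

With these invented counts, the system that never flags anything reaches an accuracy of 0.95, only slightly below the 0.96 of the system that finds 30 of the 50 errors, while precision, recall, and the raw counts separate the two clearly.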

Similar resources

Evaluating the Impact of Morphosyntactic Ambiguity in Grammatical Error Detection

We present a study of the impact of morphological and syntactic ambiguity in the process of grammatical error detection. We will present three different systems that have been devised with the objective of detecting grammatical errors in Basque and will examine the influence of ambiguity in their results. We infer that the ambiguity rate in the input to an error detection tool can have a consid...

Evaluating performance of grammatical error detection to maximize learning effect

This paper proposes a method for evaluating grammatical error detection methods to maximize the learning effect obtained by grammatical error detection. To achieve this, this paper sets out the following two hypotheses: imperfect, rather than perfect, error detection maximizes learning effect; and precision-oriented error detection is better than a recall-oriented one in terms of learning effec...

Towards a standard evaluation method for grammatical error detection and correction

We present a novel evaluation method for grammatical error correction that addresses problems with previous approaches and scores systems in terms of improvement on the original text. Our method evaluates corrections at the token level using a globally optimal alignment between the source, a system hypothesis, and a reference. Unlike the M² Scorer, our method provides scores for both detection a...

There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction

Current methods for automatically evaluating grammatical error correction (GEC) systems rely on gold-standard references. However, these methods suffer from penalizing grammatical edits that are correct but not in the gold standard. We show that reference-less grammaticality metrics correlate very strongly with human judgments and are competitive with the leading reference-based evaluation metr...

JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and bench...

Publication date: 2012